AITopics | log prob

Collaborating Authors

log prob

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multi-Level Explanations for Generative Language Models

Paes, Lucas Monteiro, Wei, Dennis, Do, Hyo Jin, Strobelt, Hendrik, Luss, Ronny, Dhurandhar, Amit, Nagireddy, Manish, Ramamurthy, Karthikeyan Natesan, Sattigeri, Prasanna, Geyer, Werner, Ghosh, Soumya

arXiv.org Artificial IntelligenceMar-21-2024

Perturbation-based explanation methods such as LIME and SHAP are commonly applied to text classification. This work focuses on their extension to generative language models. To address the challenges of text as output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms. To handle text output, we introduce the notion of scalarizers for mapping text to real numbers and investigate multiple possibilities. To handle long inputs, we take a multi-level approach, proceeding from coarser levels of granularity to finer ones, and focus on algorithms with linear scaling in model queries. We conduct a systematic evaluation, both automated and human, of perturbation-based attribution methods for summarization and context-grounded question answering. The results show that our framework can provide more locally faithful explanations of generated outputs.

explanation, log prob, scalarizer, (16 more...)

arXiv.org Artificial Intelligence

2403.14459

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.46)

Add feedback

Efficient high-dimensional maximum entropy modeling via symmetric partition functions

Neural Information Processing SystemsMar-14-2024, 02:13:37 GMT

Maximum entropy (MaxEnt) modeling is a popular choice for sequence analysis in applications such as natural language processing, where the sequences are embedded in discrete, tractably-sized spaces. We consider the problem of applying MaxEnt to distributions over paths in continuous spaces of high dimensionality-- a problem for which inference is generally intractable. Our main contribution is to show that this intractability can be avoided as long as the constrained features possess a certain kind of low dimensional structure. In this case, we show that the associated partition function is symmetric and that this symmetry can be exploited to compute the partition function efficiently in a compressed form. Empirical results are given showing an application of our method to learning models of high-dimensional human motion capture data.

partition function, probability, trajectory, (14 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.62)

Add feedback

Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

Johnson, Daniel D., Tarlow, Daniel, Duvenaud, David, Maddison, Chris J.

arXiv.org Artificial IntelligenceFeb-13-2024

Identifying how much a model ${\widehat{p}}_{\theta}(Y|X)$ knows about the stochastic real-world process $p(Y|X)$ it was trained on is important to ensure it avoids producing incorrect or "hallucinated" answers or taking unsafe actions. But this is difficult for generative models because probabilistic predictions do not distinguish between per-response noise (aleatoric uncertainty) and lack of knowledge about the process (epistemic uncertainty), and existing epistemic uncertainty quantification techniques tend to be overconfident when the model underfits. We propose a general strategy for teaching a model to both approximate $p(Y|X)$ and also estimate the remaining gaps between ${\widehat{p}}_{\theta}(Y|X)$ and $p(Y|X)$: train it to predict pairs of independent responses drawn from the true conditional distribution, allow it to "cheat" by observing one response while predicting the other, then measure how much it cheats. Remarkably, we prove that being good at cheating (i.e. cheating whenever it improves your prediction) is equivalent to being second-order calibrated, a principled extension of ordinary calibration that allows us to construct provably-correct frequentist confidence intervals for $p(Y|X)$ and detect incorrect responses with high probability. We demonstrate empirically that our approach accurately estimates how much models don't know across ambiguous image classification, (synthetic) language modeling, and partially-observable navigation tasks, outperforming existing techniques.

calibration, learning, probability, (16 more...)

arXiv.org Artificial Intelligence

2402.08733

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Portugal > Porto > Porto (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report > Experimental Study (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

A Bias-Variance Decomposition for Ensembles over Multiple Synthetic Datasets

Räisä, Ossi, Honkela, Antti

arXiv.org Artificial IntelligenceFeb-6-2024

Recent studies have highlighted the benefits of generating multiple synthetic datasets for supervised learning, from increased accuracy to more effective model selection and uncertainty estimation. These benefits have clear empirical support, but the theoretical understanding of them is currently very light. We seek to increase the theoretical understanding by deriving bias-variance decompositions for several settings of using multiple synthetic datasets. Our theory predicts multiple synthetic datasets to be especially beneficial for high-variance downstream predictors, and yields a simple rule of thumb to select the appropriate number of synthetic datasets in the case of mean-squared error and Brier score. We investigate how our theory works in practice by evaluating the performance of an ensemble over many synthetic datasets for several real datasets and downstream predictors. The results follow our theory, showing that our insights are also practically relevant.

dataset, prob, synthetic dataset, (15 more...)

arXiv.org Artificial Intelligence

2402.03985

Country:

North America > United States > California (0.05)
North America > United States > Wisconsin (0.04)
Europe > Russia (0.04)
(3 more...)

Genre: Research Report > New Finding (0.69)

Industry: Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach

Aboagye, Prince, Zheng, Yan, Wang, Junpeng, Saini, Uday Singh, Dai, Xin, Yeh, Michael, Fan, Yujie, Zhuang, Zhongfang, Jain, Shubham, Wang, Liang, Zhang, Wei

arXiv.org Artificial IntelligenceJan-15-2024

The emergence of pre-trained models has significantly impacted Natural Language Processing (NLP) and Computer Vision to relational datasets. Traditionally, these models are assessed through fine-tuned downstream tasks. However, this raises the question of how to evaluate these models more efficiently and effectively. In this study, we explore a novel approach where we leverage the metafeatures associated with each entity as a source of worldly knowledge and employ entity representations from the models. We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models. Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models, and image models. Pre-training on large models is becoming increasingly common in various machine learning applications, thanks to the growing amount of user-generated content. This is evident in areas such as Natural Language Processing (NLP) with models like GPT (Generative Pretrained Transformer), and in the vision-language domain with models like CLIP. Typically, the effectiveness of these models is evaluated using downstream tasks. However, these can be relatively costly if all tasks need to be performed.

dataset, vit-b-16 map, vit-b-32 probe acc, (12 more...)

arXiv.org Artificial Intelligence

2401.02987

Country:

North America > United States > New Jersey (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

How do Language Models Bind Entities in Context?

Feng, Jiahai, Steinhardt, Jacob

arXiv.org Artificial IntelligenceOct-26-2023

To correctly use in-context information, language models (LMs) must bind entities to their attributes. For example, given a context describing a "green square" and a "blue circle", LMs must bind the shapes to their respective colors. We analyze LM representations and identify the binding ID mechanism: a general mechanism for solving the binding problem, which we observe in every sufficiently large model from the Pythia and LLaMA families. Using causal interventions, we show that LMs' internal activations represent binding information by attaching binding ID vectors to corresponding entities and attributes. We further show that binding ID vectors form a continuous subspace, in which distances between binding ID vectors reflect their discernability. Overall, our results uncover interpretable strategies in LMs for representing symbolic knowledge in-context, providing a step towards understanding general in-context reasoning in large-scale LMs.

binding id, intervention, vector, (13 more...)

arXiv.org Artificial Intelligence

2310.17191

Country:

Europe > France (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
Europe > United Kingdom > Scotland (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback